Skip to content

Add some doc about interface endpoint layout#13281

Merged
gxglass merged 4 commits into
apple:mainfrom
gxglass:rpc-doc
May 29, 2026
Merged

Add some doc about interface endpoint layout#13281
gxglass merged 4 commits into
apple:mainfrom
gxglass:rpc-doc

Conversation

@gxglass
Copy link
Copy Markdown
Collaborator

@gxglass gxglass commented May 26, 2026

Generated in the process of reviewing PR #13275.

Updated to describe techniques to be used across protocol-compatible releases (probably rare to never) and techniques (interface compaction, the agent called it) that can be used across protocol-incompatible releases. Use the latter to address a latent bug from prior code removal on main.

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: 7563bdd
  • Duration 0:21:49
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 7563bdd
  • Duration 0:39:37
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 7563bdd
  • Duration 0:45:40
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 7563bdd
  • Duration 1:01:25
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 7563bdd
  • Duration 1:04:23
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 7563bdd
  • Duration 1:07:15
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 7563bdd
  • Duration 1:57:21
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@gxglass gxglass requested a review from tclinkenbeard-oai May 28, 2026 16:31
Copy link
Copy Markdown
Collaborator

@tclinkenbeard-oai tclinkenbeard-oai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generated by Codex.

What is it trying to do?

Document RPC interface endpoint layout and explain a latent CommitProxyInterface::setThrottledShard routing bug introduced when earlier endpoint fields were deleted.

Is it correct?

Partially. The anchor-plus-offset layout and the active mismatch are real, and all visible CI checks are green. However, the compatibility explanation is unsafe.

Endpoint offsets are a compatibility contract for binaries sharing a compatible protocol version. FlowTransport accepts compatible peers, not only identical builds. The cluster controller serializes ServerDBInfo, workers deserialize it using their local protocol version, and Ratekeeper consumes its CommitProxyInterface to send setThrottledShard. Mixed compatible binaries can therefore reconstruct different offsets for the same anchor.

The multiversion client does not establish identical source trees either: clients are indexed by normalizedVersion(), and compatible protocol changes keep the same connection.

I reviewed the diff and relevant public source paths only. I did not run builds or tests.

Are there bugs?

  1. The section saying indices can be repacked within a protocol version, and that legacyGetConsistentReadVersion has no compatibility reason to remain, is incorrect. Repacking requires preserving historical slots or making the protocol version incompatible. ProtocolVersions.cmake explicitly calls out communication changes.

  2. The concrete numbers are stale after PR #13275. Current main reconstructs setThrottledShard with offset 12, while initEndpoints() registers it at vector index 10. The mismatch remains real, but the documented 13 -> 11 fix is both outdated and unsafe for compatible mixed binaries.

  3. The failure is not entirely silent. An unresolved stream endpoint causes an WLTOKEN_ENDPOINT_NOT_FOUND response, which records an EndpointNotFound event. The hot-shard update is still not delivered.

Are there omissions?

The document should describe the mixed-binary compatibility boundary and recommend reserving historical slots with placeholders. A regression test should verify both local registration alignment and stable endpoint layouts across compatible versions.

Are there better ways of doing things?

Rewrite the compatibility subsection around the actual invariant: adjusted endpoint positions must remain stable for all compatible binaries. Preserve removed slots explicitly, or bump the protocol version incompatibly when intentionally repacking an interface.

Should this CL be LGTMd?

Not yet. The latent bug report is useful, but the compatibility guidance and stale remediation need correction before this should be merged.

gxglass added 3 commits May 29, 2026 12:05
… tree, replace with what seems like more accurate info. Also update misc details. Also fix the bugs with earlier refactoring rather than just talk about them.
@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: f177a89
  • Duration 0:22:05
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@gxglass gxglass requested a review from tclinkenbeard-oai May 29, 2026 19:32
@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: f177a89
  • Duration 0:33:44
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: f75f08d
  • Duration 0:22:00
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: f177a89
  • Duration 0:45:11
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: f177a89
  • Duration 0:46:07
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: f177a89
  • Duration 0:47:45
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: f75f08d
  • Duration 0:37:02
  • Result: ❌ FAILED
  • Error: Error while executing command: ctest -j ${NPROC} --no-compress-output -T test --output-on-failure. Reason: exit status 8
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: f177a89
  • Duration 0:57:40
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: f177a89
  • Duration 1:00:44
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: f75f08d
  • Duration 0:41:19
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: f75f08d
  • Duration 0:45:30
  • Result: ❌ FAILED
  • Error: Error while executing command: ctest -j ${NPROC} --no-compress-output -T test --output-on-failure. Reason: exit status 8
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: f75f08d
  • Duration 0:45:59
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: f75f08d
  • Duration 1:03:57
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Copy Markdown
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: f75f08d
  • Duration 1:04:16
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@gxglass
Copy link
Copy Markdown
Collaborator Author

gxglass commented May 29, 2026

CI's failed with

The following tests FAILED:
         50 - fdb_c_wiggle_and_upgrade (Failed)
Errors while running CTest

and

The following tests FAILED:
         49 - fdb_c_wiggle_only (Failed)
Errors while running CTest

So I recan them. If it's this PR it should probably be deterministic because I flat out changed some RPC interfaces, which I doubt are being used, but maybe they are. However these ctests pass locally:

[root@gglass-dev-okteto-7d9784ccdf-f6xb9 build_output4]# ctest -R wiggle  
  Test project /root/build_output4                                                                                     
      Start  5: test_venv_setup                                                                                        
  1/3 Test  #5: test_venv_setup ..................   Passed    7.58 sec                                                
      Start 49: fdb_c_wiggle_only                                                                                      
  2/3 Test #49: fdb_c_wiggle_only ................   Passed   71.52 sec                                                
      Start 50: fdb_c_wiggle_and_upgrade                                                                               
  3/3 Test #50: fdb_c_wiggle_and_upgrade .........   Passed   89.52 sec                                                
                                                                                                                       
  100% tests passed, 0 tests failed out of 3                                                                           
                                                                                                                       
  Total Test time (real) = 168.63 sec                                                                                  
  [root@gglass-dev-okteto-7d9784ccdf-f6xb9 build_output4]#                                                             
  [root@gglass-dev-okteto-7d9784ccdf-f6xb9 build_output4]# ctest -R wiggle                                             
  Test project /root/build_output4                                                                                     
      Start  5: test_venv_setup                                                                                        
  1/3 Test  #5: test_venv_setup ..................   Passed    4.41 sec                                                
      Start 49: fdb_c_wiggle_only                                                                                      
  2/3 Test #49: fdb_c_wiggle_only ................   Passed   76.95 sec                                                
      Start 50: fdb_c_wiggle_and_upgrade                                                                               
  3/3 Test #50: fdb_c_wiggle_and_upgrade .........   Passed   83.50 sec                                                
                                                                                                                       
  100% tests passed, 0 tests failed out of 3                                                                           
                                                                                                                       
  Total Test time (real) = 164.87 sec                                                                                  
  [root@gglass-dev-okteto-7d9784ccdf-f6xb9 build_output4]#                

@gxglass gxglass merged commit 6ca9e16 into apple:main May 29, 2026
6 of 8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants